The Relationship Between Instructional and Classroom Assessment Practices of
Elementary Teachers and Student Scores on High-Stakes Tests 

This study sought to determine the relationships between teacher self-reported
instructional and classroom assessment practices and scores on a state high-stakes
test. Seventy-nine fifth grade teachers participated. Average mathematics and reading test scale
scores of students in each class were used as dependent variables, with a measure of aptitude as a
covariate. Overall, there were few relationships, suggesting that many variations in instruction and
assessment are compatible with high achievement. There was some evidence to suggest that use of
cooperative learning and small groups, direct teaching, the use of formative assessment, and use of
essay tests showed small positive relationships to achievement. Few differences were noted
between mathematics and reading. Implications for improving external high-stakes tests are
discussed.

As the importance of large-scale assessments has risen, so too has the impact of these tests
on classroom teaching and assessment practices. While many believe that the consequences have 
been positive, previous research supports the conclusion that there are significant negative effects 
of high-stakes testing on teaching and learning. As noted by assessment expert Lorrie Shepard, "it 
is important to recognize the pervasive negative effects of accountability tests and the extent to 
which externally imposed testing programs prevent and drive out thoughtful classroom practices" 
(Shepard, 2000, p. 9). Preliminary research on the consequences of high-stakes testing has
suggested that it may de-professionalize teaching, increase rote memorization, narrow the 
curriculum, promote classroom assessment practices that mirror the format of the tests, and 
encourage a direct style of teaching (Darling-Hammond, 1988; McMillan, 2001; Shepard, 2000), 
but these findings are based mostly on national-level standardized tests. Cizek (2001), on the other 
hand, suggests several positive consequences, including increased use of student performance data 
to evaluate programs, increased knowledge about testing, renewed interest in teacher 
professionalism, and increased student learning. 

It has been noted that the objective format and psychometric principles of large-scale 
testing conflict with implications for teaching and learning derived from contemporary views of 
learning represented by cognitive and constructivist paradigms (McMillan, in press; Shepard, 
2000). As a result, teachers' decision-making in the classroom may be conflicted. What
constructivist theories promote about student learning and teaching, such as authentic learning,
deep understanding, intrinsic motivation, constructed-response assessment, and student formative
self-assessment, tends to conflict with pressures to align teaching and classroom assessments with
external tests that emphasize simple understanding, decontextualized tasks, and selected-response
formats. Some have suggested that high-stakes tests have influenced teachers' decisions to leave the
profession.
While a number of studies document these impacts, most of the evidence is anecdotal, and
rarely has there been in-depth research to fully understand how methods of instruction, classroom
assessment, and student performance on high-stakes tests are related. Too often, positive or
negative relationships are asserted on the basis of political or philosophical positions, without
solid evidence. It is often argued that teachers are not mandated to use particular
instructional strategies and that accountability applies only to student outcomes, but there is little
evidence about how teachers effectively integrate instructional approaches with the pressures of
external testing.

In an early study of the effects of high-stakes testing on instruction, Smith (1991) found 
that teachers used more worksheets and less hands-on instruction. A survey of 236 elementary 
teachers in North Carolina (Jones, Jones, Hardin, Chapman, Yarbrough, & Davis, 1999) found that 
67% had changed their instructional methods as a result of high-stakes testing. The nature of the 
change was mixed, with approximately equal numbers using more and fewer inquiry projects and 
worksheets, with increases in using hands-on activities, group activities, and student-centered 
instruction. Firestone, Monfils, and Camilli (2001) conducted a survey of 287 fourth grade 
teachers in New Jersey and found that state-level testing and accountability demands were not 
sufficient to impact teaching and that most teachers have not dramatically changed practices. 
Rather, changes in instruction depended on local support and pressure. 

What is lacking in most of the past research on the effects of high-stakes testing is 
empirical evidence that relates instructional and classroom assessment practices to actual test 
scores. The current study examines these relationships using test data for students in 79 Virginia 
elementary classrooms. It is also clear that teachers and administrators are pressured to raise test 
scores, but there is no evidence that particular instructional or classroom assessment strategies will 
increase accountability test scores. This study takes one step in investigating this issue. 



Furthermore, it is important to account for student ability when examining standardized test score
data so that established relationships are not confounded by students' entering achievement and
ability.

Two specific research questions were investigated: 

What is the relationship between instructional methods emphasized and Standards of 
Learning (SOL) test scores? 

What is the relationship between classroom assessment practices and SOL test scores? 

Methodology 

Sample 

The convenience sample included 79 fifth grade teachers from 29 K-5 elementary schools in a 
suburban school district. The district is socially and economically diverse. 

Instruments 

Instruments for the study included the SOL measures for the dependent variables, Stanford 
Achievement Test scores as a measure of student ability, and a teacher survey of instructional and 
classroom assessment practices. In Virginia, SOL tests are administered to every fifth grade
student in May. This study utilized the math and reading/language arts tests, which are separately
administered 50-item multiple-choice tests. The average scale score of each fifth grade teacher's
students for math and reading/language arts was calculated and used in the analyses. Stanford 9
reading and math scores for each student were obtained during the fall of the fourth grade and
averaged for the fifth grade students in each class to provide a proxy for student ability.

The survey data were collected by teacher self-report in early June. There were six items that 
measured instructional practices and 13 items that focused on classroom assessment practices. The 
six instructional practice items were based in part on scales derived from research by Monfils, 
Camilli, Firestone, and Mayrowetz (2000), in which validity evidence was found to support 
collaborative learning, active learning (constructivist approaches), and traditional teaching (direct
instruction and independent seat work) as separate instructional components. Two additional items 
were added to the four generated from this research to focus on whether instruction more generally 
was focused on the state standards and SOL tests. Additional survey items on assessment were 
based on earlier research by McMillan (2002) that focused on the classroom assessment practices 
of elementary teachers. In that study evidence for validity supported four types of assessment 
practices (objective, essay, portfolios, and authentic), different cognitive levels assessed (recall or 
deep understanding), the extent to which teachers constructed their own assessments or used 
assessments supplied to them, and whether classroom assessments were aligned with the SOL tests. 

The Likert-type scale used in McMillan (2002) was modified slightly for all questions.
Teachers were asked to indicate the extent to which they had used each practice, separately for
math and English/language arts.

Findings 

The percentages of teachers responding to each point on the scale, means, and standard 
deviations are summarized in Tables 1 and 2. For English/language arts, direct instruction was
clearly used most, with a mean of 4.22 and 83% of the teachers indicating that they used this
approach "quite a bit" or "extensively." Cooperative and small group activities and constructivist
teaching methods were used more than independent seat or class work, with means of 3.96,
3.83, and 3.67, respectively, and with only slightly lower percentages of teachers using these
approaches "quite a bit" or "extensively." As would be expected, instruction was focused heavily on
the SOL and SOL tests. The results for math instruction were essentially the same.

For English/language arts assessment practices, objective tests were used much more than informal,
performance, essay, authentic, or portfolio assessments (means 4.54, 4.01, 3.58, 3.33, 3.47, and 2.78,
respectively). Teachers emphasized assessments that measure recall knowledge and deep
understanding about the same, with more than 80% responding "quite a bit" or "extensively." Only
61% indicated "quite a bit" or "extensively" for assessments requiring student explanations.
Classroom assessments were heavily focused on the SOL and SOL tests. Similar patterns of results
were found in math, with the exception of essays, which were used much less.

Table 3 presents partial correlations between teacher responses and SOL test score results, with 
Stanford 9 scores used as covariates. For English/language arts only one correlation was 
statistically significant, which showed a positive relationship between the extent to which 
cooperative and small group activities were used and SOL scores (r=.34). The emphasis on direct 
instruction approached significance (r=.22). Other instructional variables were unrelated to SOL 
scores. In math, none of the instructional variables were related to SOL scores. 
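
The partial correlations described above can be illustrated with a minimal Python sketch. This is not the authors' analysis: it assumes a single ability covariate per class (the study may have used more than one Stanford 9 score), it uses the standard first-order partial correlation formula, and the function names and class-level data values are hypothetical.

```python
import math

def pearson(x, y):
    """Pearson correlation between two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def partial_from_r(r_xy, r_xz, r_yz):
    """First-order partial correlation of x and y, controlling for z."""
    return (r_xy - r_xz * r_yz) / math.sqrt((1 - r_xz ** 2) * (1 - r_yz ** 2))

def partial_corr(practice, sol, ability):
    """Partial correlation between a practice rating and class-mean SOL
    scores, holding the class-mean ability proxy constant."""
    return partial_from_r(
        pearson(practice, sol),
        pearson(practice, ability),
        pearson(sol, ability),
    )

# Hypothetical data for eight classes (not taken from the study).
practice = [3, 4, 5, 2, 4, 5, 3, 4]                # teacher rating, 1-5 scale
ability = [620, 605, 640, 615, 632, 611, 628, 600]  # class-mean ability proxy
sol = [418, 409, 447, 407, 436, 423, 421, 405]      # class-mean SOL scale score

print(round(partial_corr(practice, sol, ability), 2))
```

The same quantity can be obtained by regressing both the practice rating and the SOL score on the covariate and correlating the residuals; the closed-form version is shown only because it is shorter.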

For classroom assessment practices, none of the relationships were statistically significant. Two
positive correlations approached significance for English/language arts: the extent to which essay
tests and informal assessments were used. In math, the extent of use of essay exams approached a
statistically significant positive relationship, while the use of supplied assessments approached a
significant negative correlation.
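
What "approached significance" means numerically can be sketched by converting a partial correlation into a t statistic. The snippet below is an illustration only: it assumes one covariate, so that the degrees of freedom are N - 3 = 76, and the function name is hypothetical. The two coefficients used are the English/language arts values reported in the text for cooperative and small group activities (.34) and direct instruction (.22).

```python
import math

def t_for_partial_r(r, n, n_covariates=1):
    """t statistic and degrees of freedom for testing a partial
    correlation against zero (df = n - 2 - number of covariates)."""
    df = n - 2 - n_covariates
    t = r * math.sqrt(df / (1 - r ** 2))
    return t, df

for label, r in [("cooperative/small group", 0.34), ("direct instruction", 0.22)]:
    t, df = t_for_partial_r(r, n=79)
    print(f"{label}: r = {r:.2f}, t({df}) = {t:.2f}")
```

Under these assumptions the coefficients correspond to t values of about 3.15 and 1.97 on 76 degrees of freedom. If the tests were two-tailed, the second value falls just short of the conventional .05 cutoff (about 1.99), which fits the description of a relationship that approached significance.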

Two factors make it difficult to obtain significant relationships in this study. First, the high 
mean scores and small standard deviations on about half the items resulted in restricted ranges that 
make it difficult to establish statistically significant correlations. Given this limitation and the
moderate sample size, it is likely that there are some true relationships that could not be
documented. Second, high correlations existed between the Stanford 9 and SOL scores (English
.86; math .84). This suggests that a high percentage of the variability in SOL scores is accounted 
for by student ability, leaving little variation that can be explained by instructional and classroom 
assessment practices. 
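
The second point is simple arithmetic on the reported correlations, sketched below in Python. The only assumption is that the squared correlation is read as the share of class-level SOL variance associated with the ability proxy; the variable and function names are hypothetical.

```python
# Correlations between class-mean Stanford 9 and SOL scores, as reported in the text.
ability_sol_r = {"English": 0.86, "math": 0.84}

def variance_shares(r):
    """Return (share of variance associated with the covariate, share left over)."""
    explained = r ** 2
    return explained, 1 - explained

for subject, r in ability_sol_r.items():
    explained, remaining = variance_shares(r)
    print(f"{subject}: {explained:.0%} associated with ability, {remaining:.0%} remaining")
```

Roughly 74% of the variance in English scores and 71% in math is tied to the ability measure, so less than a third is left for any instructional or assessment variable to explain.
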

This study is limited to teacher perceptions of classroom practices and to the sample characteristics
of a primarily suburban school district. It is possible that observational data would provide
different results, since some teachers may want to respond in socially desirable ways. Even though
the survey was anonymous, the perceptions may not reflect the realities of what occurred in the
classes.

Given this limitation, the findings suggest that while there may be some relationships between 
instructional and classroom assessment practices and student achievement on high-stakes tests, 
these relationships are both few and small. Many of the factors did not show any relationships with 
test scores, which suggests that variations in instructional practices and classroom assessments may 
not be responsible for differences in test scores. This finding is consistent with Firestone et al.
(2001), and may mean that high scores on high-stakes tests can be achieved with a variety of teaching
methods and assessments. It may be that the quality of the delivery of different practices is more
important than the particular approach to instruction and assessment. That is, there may be several
methods or approaches that can be used to produce high scores.

There were also some trends in the relationships that are important. Consistent with much 
research, there was a positive relationship between the use of cooperative and small group 
instruction and English test scores. This may mean that teachers who tend to use these techniques 
more will have higher test scores. The trend that showed a positive relationship between direct 
teaching and test scores supports what many see as a detrimental impact of objective high-stakes
tests. For classroom assessment practices, it is interesting that more use of essay tests for both 
English and math was related to higher objective test scores. It may be that essay tests require 
student learning that is consistent with the cognitive level of the tests. Even with the statistical 
adjustment for ability, it may also be that students in classes that emphasize essays more are in 
general more capable than students in classes that use less essay assessment. 

The finding of a positive relationship between the use of informal, formative assessment and
test scores is consistent with recent research reported by Black & Wiliam (1998). As pointed out
by Stiggins (2002) and Brookhart (2001), teacher use of formative assessments and of frequent,
specific, and descriptive feedback to students is supported by recent cognitive learning and
motivation theories. Even though these findings are primarily correlational, this may suggest an
important way for teachers to have direct control over a factor that may enhance high-stakes test
results.

What can teachers and administrators do to improve students' scores on high-stakes tests? This 
study suggests that moderate impacts can be made by some practices, but much research is needed 
to establish relationships between instructional and assessment practices and test scores. 
Specifically, there is a need to measure both instructional and assessment practices in ways that 
provide more variability so that there is greater sensitivity to measure relationships. There is also a 
need to provide better measures of student ability so that the unique contributions of teaching and 
assessment can be determined. 

Table 1

Percentages (rounded), Means and Standard Deviations of Fifth Grade
Teacher Responses To English/Language Arts Questions

N=79

                                                          Not at  Very          Quite  Exten-
To what extent did you use:                               All     Little  Some  a Bit  sively  Mean  S.D.

 1. assessments that measure student recall knowledge or          3       16    29     53      4.32   .83
    simple comprehension
 2. assessments that measure student deep understanding,                  16    43     42      4.26   .72
    reasoning, and/or application
 3. objective assessments (e.g., multiple choice,                 1       6     29     63      4.54   .68
    matching, short answer)
 4. assessments supplied to you (e.g., from publishers,   3       17      22    27     31      3.68  1.16
    software, instructor's guide, division)
 5. assessments constructed by yourself                           1       13    41     45      4.29   .74
 6. performance assessments (e.g., structured teacher     1       9       39    33     18      3.58   .93
    observations or ratings of performance such as a
    project, speech or paper)
 7. essay-type assessments                                4       11      46    27     13      3.33   .97
 8. portfolios                                            15      27      34    13     11      2.78  1.20
 9. direct instruction (i.e., structured, systematic              4       13    41     42      4.22   .82
    teaching in which information is presented to
    students with review and practice; teacher-centered)
10. assessments aligned closely to the SOL goals and                            11     87      4.85   .46
    objectives
11. assessments aligned closely to SOL test(s)                            8     22     69      4.59   .69
12. informal assessments that provide immediate feedback                  27    44     29      4.01   .75
    to students
13. constructivist teaching methods (e.g., active         1       7       29    35     29      3.83   .97
    learning, contextualized, creating personal meaning
    for students, student-centered, discovery, reflection)
14. instruction focused on the SOL goals and objectives                         17     84      4.84   .37
15. instruction focused on the SOL test(s)                1       -       11    20     67      4.52   .80
16. assessments or instruction that required student      4       5       30    41     20      3.68   .98
    explanations of work, conclusions, opinions, ideas,
    and/or answers
17. authentic assessments (i.e., questions based on       3       10      32    49     6       3.47   .86
    “real world” problems or materials)
18. cooperative and/or small group learning activities            3       28    41     29      3.96   .82
19. independent seat or class work with assignments       -       11      33    33     23      3.67   .96
    and/or worksheets



Table 2

Percentages (rounded), Means and Standard Deviations of Fifth Grade
Teacher Responses To Math Questions

N=79

                                                          Not at  Very          Quite  Exten-
To what extent did you use:                               All     Little  Some  a Bit  sively  Mean  S.D.

 1. assessments that measure student recall knowledge or  -       3       15    29     54      4.33   .83
    simple comprehension
 2. assessments that measure student deep understanding,  -       -       10    41     49      4.39   .67
    reasoning, and/or application
 3. objective assessments (e.g., multiple choice,         1       1       4     30     63      4.52   .77
    matching, short answer)
 4. assessments supplied to you (e.g., from publishers,   3       9       23    26     39      3.90  1.11
    software, instructor's guide, division)
 5. assessments constructed by yourself                   -       3       16    39     43      4.21   .82
 6. performance assessments (e.g., structured teacher     3       14      40    24     19      3.41  1.04
    observations or ratings of performance such as a
    project, speech or paper)
 7. essay-type assessments                                16      27      34    14     9       2.73  1.15
 8. portfolios                                            27      29      24    14     6       2.43  1.20
 9. direct instruction (i.e., structured, systematic      -       4       14    39     43      4.20   .84
    teaching in which information is presented to
    students with review and practice; teacher-centered)
10. assessments aligned closely to the SOL goals and                            9      92      4.92   .28
    objectives
11. assessments aligned closely to SOL test(s)                            10    20     70      4.61   .67
12. informal assessments that provide immediate feedback                  20    44     36      4.16   .74
    to students
13. constructivist teaching methods (e.g., active                 7       30    35     28      3.83   .92
    learning, contextualized, creating personal meaning
    for students, student-centered, discovery, reflection)
14. instruction focused on the SOL goals and objectives                         14     86      4.86   .35
15. instruction focused on the SOL test(s)                                11    20     69      4.58   .69
16. assessments or instruction that required student      1       7       35    38     18      3.65   .91
    explanations of work, conclusions, opinions, ideas,
    and/or answers
17. authentic assessments (i.e., questions based on       1       6       27    54     13      3.70   .82
    “real world” problems or materials)
18. cooperative and/or small group learning activities            3       28    38     31      3.97   .85
19. independent seat or class work with assignments               13      31    34     23      3.66   .97
    and/or worksheets



Table 3

Partial Correlation Coefficients of the Relationship Between Fifth Grade Classroom Assessment
and Instructional Practices and SOL Test Scores, Controlling for Student Ability

N=79

                                                          SOL Test
Assessment or Instructional Practice                      English  Math

 1. assessments that measure student recall knowledge or   .10      .03
    simple comprehension
 2. assessments that measure student deep understanding,   .10      .13
    reasoning, and/or application
 3. objective assessments (e.g., multiple choice,          .15      .02
    matching, short answer)
 4. assessments supplied to you (e.g., from publishers,   -.04     -.26*
    software, instructor's guide, division)
 5. assessments constructed by yourself                   -.02      .18
 6. performance assessments (e.g., structured teacher     -.03     -.11
    observations or ratings of performance such as a
    project, speech or paper)
 7. essay-type assessments                                 .25*     .24*
 8. portfolios                                            -.01      .03
 9. direct instruction (i.e., structured, systematic       .22*    -.11
    teaching in which information is presented to
    students with review and practice; teacher-centered)
10. assessments aligned closely to the SOL goals and       .22*    -.12
    objectives
11. assessments aligned closely to SOL test(s)            -.06      .00
12. informal assessments that provide immediate feedback   .23*     .02
    to students
13. constructivist teaching methods (e.g., active          .03      .00
    learning, contextualized, creating personal meaning
    for students, student-centered, discovery, reflection)
14. instruction focused on the SOL goals and objectives    .16      .00
15. instruction focused on the SOL test(s)                 .03     -.02
16. assessments or instruction that required student       .19      .15
    explanations of work, conclusions, opinions, ideas,
    and/or answers
17. authentic assessments (i.e., questions based on        .12      .07
    “real world” problems or materials)
18. cooperative and/or small group learning activities     .34**   -.01
19. independent seat or class work with assignments       -.05     -.16
    and/or worksheets

* p < .10; ** p < .01